Action-Aware Embedding Enhancement for Image-Text Retrieval

Abstract

Image-text retrieval plays a central role in bridging vision and language. It aims to reduce the semantic discrepancy between images and texts. Most existing works rely on refined word and object representations, learned through data-oriented methods that capture word-object co-occurrence. Such approaches tend to ignore the asymmetric action relation between images and texts: the text carries explicit action information (i.e., verb phrases), while the image contains only implicit action information. In this paper, we propose Action-aware Memory-Enhanced embedding (AME) for image-text retrieval, which emphasizes action information when mapping images and texts into a shared embedding space. Specifically, we integrate action prediction with an action-aware memory bank to enrich the features with action-similar features. The effectiveness of the proposed AME is verified by comprehensive experimental results on two benchmark datasets.
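The abstract describes the memory enhancement only at a high level. The sketch below is a hypothetical illustration, not the authors' implementation. It assumes that action prediction produces a softmax distribution over action classes, that the memory bank stores feature vectors grouped by action class, and that enrichment adds a probability-weighted mean of the stored features, scaled by a hypothetical factor `alpha`. Retrieval then ranks gallery items by cosine similarity in the shared space. The names `enrich`, `rank`, `alpha`, and the dictionary layout of the memory bank are all illustrative assumptions.

```python
import math
from typing import Dict, List

Vector = List[float]


def softmax(logits: List[float]) -> List[float]:
    """Turn action-prediction logits into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two vectors in the shared space."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def enrich(feature: Vector, action_logits: List[float],
           memory: Dict[int, List[Vector]], alpha: float = 0.5) -> Vector:
    """Hypothetical memory enhancement (assumption, not the paper's formula):
    add the mean of each action class's memory slots, weighted by the
    predicted probability of that action and scaled by alpha."""
    probs = softmax(action_logits)
    enriched = list(feature)
    dim = len(feature)
    for action_id, p in enumerate(probs):
        slots = memory.get(action_id, [])
        if not slots:
            continue  # no stored features for this action class
        mean = [sum(s[d] for s in slots) / len(slots) for d in range(dim)]
        for d in range(dim):
            enriched[d] += alpha * p * mean[d]
    return enriched


def rank(query: Vector, gallery: List[Vector]) -> List[int]:
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine(query, g) for g in gallery]
    return sorted(range(len(gallery)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    # Toy memory bank: action class 0 has two stored features, class 1 has one.
    memory = {0: [[1.0, 0.0], [0.8, 0.2]], 1: [[0.0, 1.0]]}
    image_feat = [0.2, 0.1]
    enriched = enrich(image_feat, [3.0, 0.0], memory)
    print(enriched)
    print(rank([1.0, 0.1], [[0.0, 1.0], [1.0, 0.0]]))
```

Because the predicted distribution strongly favors action class 0 in the toy example, the enriched feature is pulled toward that class's stored features. Weighting by the full distribution rather than taking only the top prediction is a design choice of this sketch, which keeps the enrichment smooth when action prediction is uncertain.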

Similar resources

Conditional Image-Text Embedding Networks

This paper presents an approach for grounding phrases in images which jointly learns multiple text-conditioned embeddings in a single end-to-end model. In order to differentiate text phrases into semantically distinct subspaces, we propose a concept weight branch that automatically assigns phrases to embeddings, whereas prior works predefine such assignments. Our proposed solution simplifies th...

Content Aware Image Enhancement

We present our approach, integrating imaging and vision, for content-aware enhancement and processing of digital photographs. The overall quality of images is improved by a modular procedure automatically driven by the image class and content.

Image retrieval using the combination of text-based and content-based algorithms

Image retrieval is an important research field which has received great attention in the last decades. In this paper, we present an approach for the image retrieval based on the combination of text-based and content-based features. For text-based features, keywords and for content-based features, color and texture features have been used. Query in this system contains some keywords and an input...

Image-Based Document Vectors for Text Retrieval

We propose a method for constructing a vector for a document image to represent its content to facilitate text retrieval. The method is based on an N-Gram algorithm for text similarity measure based on the frequency of occurrence of n-character strings appearing in the electronic text. Instead of using ASCII values, the present study investigates the use of character images to obtain the docume...

Dual-Path Convolutional Image-Text Embedding

This paper considers the task of matching images and sentences. The challenge consists in discriminatively embedding the two modalities onto a shared visual-textual space. Existing work in this field largely uses Recurrent Neural Networks (RNN) for text feature learning and employs off-the-shelf Convolutional Neural Networks (CNN) for image feature extraction. Our system, in comparison, differs...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2022

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v36i2.20020